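For decades, researchers have searched for fish-like flapping motions that achieve underwater propulsion at low energy cost. The complexity of the unsteady flow field around a flapping body makes this problem very difficult. In earlier studies, the motion pattern was usually prescribed as some periodic function, which constrains the subsequent optimization to a small subdomain of the whole motion space. In this work, to avoid this motion constraint, a variational autoencoder (VAE) is designed to compress various flapping motions into a simple action vector. We then let a flapping airfoil continuously interact with a water tunnel environment and adjust its action accordingly through a reinforcement learning (RL) framework. Through this automatic closed-loop experiment, we obtain several motion patterns that yield high hydrodynamic efficiency compared with pure harmonic motions at the same thrust level. We also find that, after many trials and errors, RL training in the current experiments always converges to motion patterns close to harmonic motions. In other words, the current work shows that a harmonic motion with appropriate amplitude and frequency is always an optimal choice for efficient underwater propulsion. Furthermore, the RL framework proposed here can be extended to the study of other complex swimming problems, which might pave the way for creating a robotic fish that swims like a real fish.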
translated by 谷歌翻译
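Text-video retrieval is a task of great practical value and has received increasing attention; learning spatial-temporal video representations is one of its research hotspots. The video encoders in state-of-the-art video retrieval models usually adopt pre-trained vision backbones directly, with the network structure fixed, so they cannot be further improved to produce fine-grained spatial-temporal video representations. In this paper, we propose the Token Shift and Selection Network (TS2-Net), a novel token shift and selection transformer architecture that dynamically adjusts the token sequence and selects informative tokens in both the temporal and spatial dimensions from input video samples. The token shift module temporally shifts whole token features back and forth across adjacent frames, to preserve the complete token representation and capture subtle movements. The token selection module then selects the tokens that contribute most to local spatial semantics. Based on thorough experiments, the proposed TS2-Net achieves state-of-the-art performance on major text-video retrieval benchmarks, including new records on MSRVTT, VATEX, LSMDC, ActivityNet, and DiDeMo.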
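The two modules described above can be illustrated with a minimal sketch. It is not the TS2-Net implementation: the choice of which token indices are shifted, the zero padding at the clip boundaries, and the externally supplied saliency scores are all assumptions made for illustration (in the paper the selection scores come from a learned module).

```python
def shift_tokens(frames, fwd_idx, bwd_idx):
    """Shift whole token features across adjacent frames.

    frames: list over time of lists of token vectors (T x N x D).
    fwd_idx / bwd_idx: hypothetical token positions chosen for shifting.
    """
    num_frames = len(frames)
    zero = [0.0] * len(frames[0][0])
    out = [[list(tok) for tok in frame] for frame in frames]
    for t in range(num_frames):
        # Forward shift: the token at frame t takes the full feature from frame t-1.
        out[t][fwd_idx] = list(frames[t - 1][fwd_idx]) if t > 0 else list(zero)
        # Backward shift: the token at frame t takes the full feature from frame t+1.
        out[t][bwd_idx] = list(frames[t + 1][bwd_idx]) if t < num_frames - 1 else list(zero)
    return out


def select_top_k(frame_tokens, scores, k):
    """Keep the k highest-scoring tokens of one frame, preserving spatial order."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [frame_tokens[i] for i in sorted(keep)]


# Toy clip: 3 frames, 3 tokens per frame, 2-dimensional features.
frames = [[[float(10 * t + n + 1)] * 2 for n in range(3)] for t in range(3)]
shifted = shift_tokens(frames, fwd_idx=0, bwd_idx=1)
selected = select_top_k(["a", "b", "c", "d"], [0.1, 0.9, 0.3, 0.8], k=2)
```

Shifting entire token vectors, rather than a slice of channels, is what lets each token keep a complete representation while still mixing information between neighbouring frames.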
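We present results from a large-scale experiment on pretraining encoders with parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Although we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates than the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% to 6.14%. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74%-4.91% on an automatic measurement of full-system user dissatisfaction.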
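Object detection has achieved substantial progress in the last decade. However, detecting novel classes with only a few samples remains challenging, since deep learning in the low-data regime usually leads to a degraded feature space. Existing works employ a holistic fine-tuning paradigm to tackle this problem: the model is first pre-trained on all base classes with abundant samples, and is then used to carve out the novel class feature space. Nonetheless, this paradigm is still imperfect. During fine-tuning, a novel class may implicitly leverage the knowledge of multiple base classes to construct its feature space, which induces a scattered feature space and hence violates inter-class separability. To overcome these obstacles, we propose a two-step fine-tuning framework, few-shot object detection via association and discrimination (FADI), which builds a discriminative feature space for each novel class in two integral steps. 1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space by explicitly imitating the feature space of a specific base class. Specifically, we associate each novel class with a base class according to their semantic similarity. After that, the feature space of a novel class can readily imitate the well-trained feature space of the associated base class. 2) In the discrimination step, to ensure separability between the novel classes and their associated base classes, we disentangle the classification branches for base and novel classes. To further enlarge the inter-class separability among all classes, a specialized margin loss is imposed. Extensive experiments on the Pascal VOC and MS-COCO datasets demonstrate that FADI achieves new SOTA performance, significantly improving the baseline in any shot/split by up to +18.7. Notably, the advantage is most pronounced in extremely few-shot scenarios.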
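The association step can be sketched as a nearest-neighbour assignment in a semantic space. The abstract only says that classes are associated "according to their semantic similarity"; the use of cosine similarity over class embedding vectors, and the toy embeddings themselves, are assumptions made here for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def associate(novel_embeddings, base_embeddings):
    """Map each novel class to its most semantically similar base class."""
    return {
        novel: max(base_embeddings, key=lambda base: cosine(vec, base_embeddings[base]))
        for novel, vec in novel_embeddings.items()
    }


# Hypothetical 2-D class embeddings, for illustration only.
base = {"horse": [1.0, 0.0], "car": [0.0, 1.0]}
novel = {"cow": [0.9, 0.1], "bus": [0.2, 0.8]}
assignment = associate(novel, base)
```

Once each novel class has a single associated base class, its features can be trained to imitate that one well-trained feature space instead of drawing implicitly on many base classes.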
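The Visual Question Answering (VQA) task uses both visual image analysis and language analysis to answer a textual question about an image. It has been a popular research topic with a growing number of real-world applications over the last decade. This paper describes our recent research on AliceMind-MMU (Alibaba's collection of encoder-decoders from the Machine Intelligence Lab of DAMO Academy, MultiMedia Understanding), which obtains results on VQA that are similar to or even slightly better than those of humans. This is achieved by systematically improving the VQA pipeline, including: (1) pre-training with comprehensive visual and textual feature representations; (2) effective cross-modal interaction with learning to attend; and (3) a novel knowledge mining framework with specialized expert modules for the complex VQA task. Treating different types of visual questions with the corresponding expertise plays an important role in boosting the performance of our VQA architecture up to the human level. Extensive experiments and analyses are conducted to demonstrate the effectiveness of the new research work.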
Decompilation aims to transform a low-level programming language (LPL) (e.g., a binary file) into its functionally equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
Nearest-Neighbor (NN) classification has proven to be a simple and effective approach for few-shot learning. Query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie near the distribution boundary between classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor-based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which uses the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to the different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning-based method can outperform, or is at least comparable to, SOTA methods that need additional learning steps.
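The calibrate-then-classify idea can be sketched as follows. This is an illustrative simplification, not the P3DC-Shot procedure: the softmax weighting over cosine similarities, the mixing factor `alpha`, the `temperature`, and the Euclidean nearest-neighbour rule are all assumptions introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def softmax(values, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    peak = max(values)
    exps = [math.exp((x - peak) / temperature) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate(support, prototypes, alpha=0.5, temperature=0.1):
    """Pull a support feature toward a similarity-weighted mix of base prototypes."""
    weights = softmax([cosine(support, p) for p in prototypes], temperature)
    prior = [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(len(support))]
    return [(1 - alpha) * s + alpha * q for s, q in zip(support, prior)]


def nearest_label(query, labelled_supports):
    """Return the label of the support feature closest to the query."""
    return min(labelled_supports, key=lambda item: math.dist(query, item[1]))[0]


# Toy example: two base prototypes and two support samples near the class boundary.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
raw_supports = [("A", [0.6, 0.5]), ("B", [0.5, 0.6])]
calibrated = [(label, calibrate(feat, prototypes)) for label, feat in raw_supports]
prediction = nearest_label([0.8, 0.3], calibrated)
```

In this toy setup the two raw support samples sit close together; calibration moves each one toward the base prototype it resembles most, which widens the gap between them before the nearest-neighbour lookup. No parameters are trained at any point.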
In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to balance content details against style features. When an image is stylized with sufficient style patterns, the content details may be damaged, and sometimes the objects in the image can no longer be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
In contrast to control-theoretic methods, the lack of a stability guarantee remains a significant problem for model-free reinforcement learning (RL) methods. Jointly learning a policy and a Lyapunov function has recently become a promising approach to providing the whole system with a stability guarantee. However, the classical Lyapunov constraints that researchers have introduced cannot stabilize the system during sampling-based optimization. Therefore, we propose the Adaptive Stability Certification (ASC), which makes the system reach sampling-based stability. Because the ASC condition can search for the optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm based on the ASC condition. Meanwhile, our algorithm avoids the optimization problem in current approaches, where a variety of constraints are coupled into the objective. When evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer stability constraint violations than previous studies.
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been used to reconstruct subsampled OCT data; more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to reconstruct lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture, with the aim of better supporting clinicians in their decision-making and improving patient outcomes.
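The effect of Gaussian windowing in the spectral domain can be illustrated with a small synthetic example. This is a sketch under stated assumptions rather than the study's simulation pipeline: the single-reflector fringe, the sample count, the window widths, and the helper names are all hypothetical, and a plain discrete Fourier transform stands in for the full OCT reconstruction.

```python
import cmath
import math


def gaussian_window(n, sigma_fraction):
    """Gaussian window centred on the spectrum; sigma is a fraction of its length."""
    center = (n - 1) / 2
    sigma = sigma_fraction * n
    return [math.exp(-0.5 * ((k - center) / sigma) ** 2) for k in range(n)]


def a_scan(spectrum):
    """Magnitude of the DFT of a spectral fringe, positive depths only."""
    n = len(spectrum)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * z / n) for k, s in enumerate(spectrum)))
        for z in range(n // 2)
    ]


def peak_width(profile):
    """Number of depth samples at or above half of the peak value."""
    half_max = max(profile) / 2
    return sum(1 for value in profile if value >= half_max)


n = 256
depth_bin = 40  # hypothetical depth of a single reflector
fringe = [math.cos(2 * math.pi * depth_bin * k / n) for k in range(n)]

# A broad window keeps most of the bandwidth; a narrow one discards much of it.
wide = a_scan([f * w for f, w in zip(fringe, gaussian_window(n, 0.25))])
narrow = a_scan([f * w for f, w in zip(fringe, gaussian_window(n, 0.05))])
```

Comparing `peak_width(wide)` with `peak_width(narrow)` shows the reflector peak broadening as the spectral window narrows, which is the loss of axial resolution that the learning-based reconstruction is meant to recover.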